Co-training algorithm with combination of active learning and density peak clustering
GONG Yanlu, LYU Jia
Journal of Computer Applications    2019, 39 (8): 2297-2301.   DOI: 10.11772/j.issn.1001-9081.2019010075
Abstract
High-ambiguity samples are easily mislabeled by the co-training algorithm, which lowers classifier accuracy, and the unlabeled samples added in each iteration do not carry enough of the useful information hidden in the unlabeled data. To address these problems, a co-training algorithm combining active learning and density peak clustering was proposed. Before each iteration, the unlabeled samples with high ambiguity were selected, actively labeled, and added to the labeled sample set; density peak clustering was then applied to the unlabeled samples to obtain the local density and relative distance of each unlabeled sample. During each iteration, the unlabeled samples with higher density and larger relative distance were selected and trained with the Naive Bayes (NB) classification algorithm, and the process was repeated until the termination condition was satisfied. Actively labeling high-ambiguity samples alleviates the mislabeling problem, while density peak clustering selects the samples that best reflect the structure of the data space. Experimental results on 8 UCI datasets and the pima dataset from Kaggle show that, compared with the SSLNBCA (Semi-Supervised Learning combining NB Co-training with Active learning) algorithm, the accuracy of the proposed algorithm improves by up to 6.67 percentage points, with an average improvement of 1.46 percentage points.
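The following is a minimal Python sketch of the training loop described in the abstract, not the authors' implementation. It assumes a single Gaussian Naive Bayes learner (the paper's co-training variant would maintain two view-specific classifiers), and the function names, the cutoff, and the selection sizes are illustrative; the oracle argument stands in for the human annotator queried by active learning.

import numpy as np
from sklearn.naive_bayes import GaussianNB


def density_peak_scores(X, cutoff=1.0):
    # Local density (rho) and distance to the nearest denser point (delta),
    # following the density peak clustering idea of Rodriguez and Laio.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = np.exp(-(dist / cutoff) ** 2).sum(axis=1) - 1.0  # exclude self
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = np.where(rho > rho[i])[0]
        delta[i] = dist[i, denser].min() if len(denser) else dist[i].max()
    return rho, delta


def cotrain_al_dpc(X_l, y_l, X_u, oracle, n_iter=10, n_query=5, n_add=10):
    # oracle(X) returns ground-truth labels for queried samples (active learning).
    clf = GaussianNB().fit(X_l, y_l)
    for _ in range(n_iter):
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)
        top2 = np.sort(proba, axis=1)[:, -2:]
        margin = top2[:, 1] - top2[:, 0]          # small margin = high ambiguity
        ambiguous = np.argsort(margin)[:n_query]
        # 1) Actively label the most ambiguous samples and add them to the labeled set.
        X_l = np.vstack([X_l, X_u[ambiguous]])
        y_l = np.concatenate([y_l, oracle(X_u[ambiguous])])
        # 2) Density peak scores on the remaining pool: prefer samples that are
        #    both dense and far from any denser point.
        keep = np.setdiff1d(np.arange(len(X_u)), ambiguous)
        rho, delta = density_peak_scores(X_u[keep])
        chosen = keep[np.argsort(rho * delta)[-n_add:]]
        # 3) Pseudo-label the chosen samples with the current classifier.
        X_l = np.vstack([X_l, X_u[chosen]])
        y_l = np.concatenate([y_l, clf.predict(X_u[chosen])])
        X_u = np.delete(X_u, np.concatenate([ambiguous, chosen]), axis=0)
        clf = GaussianNB().fit(X_l, y_l)
    return clf

A call such as cotrain_al_dpc(X_l, y_l, X_u, oracle=lambda X: annotate(X)) would run the loop, where annotate is a hypothetical stand-in for the human labeling step.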